Take Home Ex 3

Author

Aruiana

Published

February 5, 2023

Modified

February 15, 2023

1. The Task

To uncover the salient patterns of the resale prices of public housing property by residential towns and estates in Singapore by using appropriate analytical visualisation techniques l

For the purpose of this study, the focus in on 3-ROOM, 4-ROOM and 5-ROOM types in 2022.

2. Data Preparation

Step 1: Load Packages

Show
pacman::p_load(tidyverse, plotly, crosstalk, DT, ggdist, gganimate, ggstatsplot, heatmaply)

Step 2: Import Data

Show
#import data
HDB <- read_csv(("data/HDB.csv"))
Rows: 146338 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): month, town, flat_type, block, street_name, storey_range, flat_mode...
dbl (3): floor_area_sqm, lease_commence_date, resale_price

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Step3: Filter Data for the study

Filter out the data required:
1. Room Type
2. Year 2022

Show
#Filter 3Room, 4Room, 5Room, Filter 2022, Convert remaining lease into years
HDBRoom <- HDB %>% filter(flat_type=="3 ROOM" | flat_type=="4 ROOM" | flat_type=="5 ROOM") %>%
  separate(month, into = c("year", "month")) %>% 
  filter(year == "2022") %>%
  separate(remaining_lease, into = c("rmlease_years", "rmlease_month"), sep = "years") 

Step 4: Amend Data Set

1. Convert the Month from Character to Number
2. Convert Remaining lease from Character to Number
3. Re-categorise towns into regions
4. Sort Storey Range by smallest to largest
5. Create new dataset for price/sqm

Show
#Convert Month from Chr to number
HDBRoom$month <- as.numeric(HDBRoom$month)

#Convert Remaining lease into numeric years in decimal
HDBRoom$rmlease_years <- as.numeric(HDBRoom$rmlease_years)

HDBRoom$rmlease_month <- gsub("[monthsmonth]", " ", HDBRoom$rmlease_month) %>%
  as.numeric(HDBRoom$rmlease_month) / 12 

HDBRoom$rmlease_month[is.na(HDBRoom$rmlease_month)] = 0

HDBRoom$rmlease <- as.numeric(HDBRoom$rmlease_years + HDBRoom$rmlease_month)

#Group Towns into Regions
HDBRoom$region <- case_when(
  HDBRoom$town %in% c("ANG MO KIO", "HOUGANG", "PUNGGOL", "SERANGOON", "SENGKANG") ~ "North-East",
    HDBRoom$town %in% c("BISHAN", "BUKIT MERAH", "BUKIT TIMAH", "CENTRAL AREA", "GEYLANG", "KALLANG/WHAMPOA", "MARINE PARADE", "QUEENSTOWN", "TOA PAYOH") ~ "Central",
    HDBRoom$town %in% c("BEDOK", "PASIR RIS", "TAMPINES") ~ "East",
    HDBRoom$town %in% c("SEMBAWANG", "WOODLANDS", "YISHUN") ~ "North",
    HDBRoom$town %in% c("BUKIT BATOK", "BUKIT PANJANG", "CHOA CHU KANG", "CLEMENTI", "JURONG EAST", "JURONG WEST") ~ "West")

#Edit storey range and sort by smallest
HDBRoom$storey_range <- str_replace(HDBRoom$storey_range, "TO", "-")

sr_sort = c("01 - 03", "04 - 06", "07 - 09", "10 - 12", "13 - 15", "16 - 18", "19 - 21", "22 - 24","25 - 27","28 - 30", "31 - 33", "34 - 36", "37 - 39", "40 - 42", "43 - 45", "46 - 48", "49 - 51")

HDBRoom$storey_range <- factor (HDBRoom$storey_range, levels = sr_sort)

#Create additional data on price per sqm
HDBRoom$price_per_sqm <- (HDBRoom$resale_price / HDBRoom$floor_area_sqm)

Step 5: Select the relevant columns for analysis

Show
HDBDATA <- HDBRoom [,!names(HDBRoom) %in% c("year", "block", "street_name", "rmlease_years", "rmlease_month", "flat_model")]

3. Data Analytics

3.1 Histogram of dataset

Show
options(scipen = 999)

p1 <- gghistostats(
  data = HDBDATA, x = "rmlease",
  type = "bayes",
  test.value = 100,
  xlab = "Resale Property remaining lease"
  )

p2 <- gghistostats(
  data = HDBDATA, x = "month",
  type = "bayes",
  test.value = 100,
  xlab = "Month of Purchase"
  )

p3 <- gghistostats(
  data = HDBDATA, x = "resale_price",
  type = "bayes",
  test.value = 100,
  xlab = "Resale Price"
  )

p4 <- gghistostats(
  data = HDBDATA, x = "price_per_sqm",
  type = "bayes",
  test.value = 100,
  xlab = "Resale Price/sqm"
  )

p5 <- ggplot(
  data = HDBDATA, aes(x = town, y=rmlease, colour = flat_type)) + geom_point() + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) + 
    labs( x = "Resale by Town")

p6 <- ggplot(
  data = HDBDATA, aes(x = storey_range, fill = flat_type)) + geom_bar() + theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) + 
    labs( x = "Resale by Storey")

(p3 + p4) / (p1 + p2)

Show
p5

Show
p6

From the following histograms, we have the following findings:

  1. When we compare the Resale Price vs Resale Price/sqm, we notice that the Resale Price is flatter and more uneven as compared to the Resale Price/sqm, especially for the lower priced. The Resale Price/Sqm is more evenly distributed, even though it is right skewed. Thus, for further analysis, the Resale price/sqm data is preferred.
  2. The resale by remaining lease histogram is double-peaked. The largest number of resale have 94 years left in the lease. This is almost immediately after the HDB MOP of 5 years for a 99year lease. This group should be the newest HDB flats. The next “peak” is at 60 years. When compared to the resale by town, we can see that the these 90+yo flats mostly come from the newer towns like Sengkang and Punggol. Those with lesser remaining leases come from the more mature estates.
  3. We noted a trend that the older flats (with lesser remaining lease) tend to be smaller room type.
  4. The resale by storey is in line with the trend that Singapore do not have many flats above 21 storey.

3.2 Comparing prices by flat type

Show
ggbetweenstats(
  data = HDBDATA,
  x = flat_type, 
  y = price_per_sqm,
  type = "np",
  messages = FALSE
)

Using a violin plot, we compare the $/sqm by flat type. Thus, it can be see that the prices is driven by demand that the median price for a 3-room flat can be higher than a 5-room flat.

Thus, the prices of the flat may have a higher correlation to location (accessibility), demand for flats, affordability. 5-room flats which cost higher due to the area range may have a lower demand.

It can be noted that the Singapore housing rates are rather stable with all 3 flat types have very close median of only $100-$200 difference.

3.3 Resale Price by region

We can see that for the North, North-East and West, the histogram is rather uniformed. However, for East and Central, the prices are binomial.

Show
HDBDATA %>%
  
grouped_gghistostats(
  x                 = resale_price,
  test.value        = 50,
  type              = "nonparametric",
  grouping.var      = region,
  normal.curve      = TRUE,
  normal.curve.args = list(color = "red", size = 1),
  ggtheme           = ggthemes::theme_tufte(),
  ## modify the defaults from `{ggstatsplot}` for each plot
  plotgrid.args     = list(nrow = 2),
  annotation.args   = list(title = "Resale price by region")
)

3.4 Comparing prices per floor area

We can see that there is an extremely high correlation between resale price and floor area.

Show
ggscatterstats(
  data = HDBDATA,
  x = resale_price,
  y = floor_area_sqm,
  marginal = FALSE,
  )

3.5 Comparing prices per floor area

Show
scdata <- highlight_key(HDBDATA) 
  
sc1 <- ggplot(data = scdata, aes(x = town, y = resale_price, colour = region)) + geom_point() + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) + scale_y_continuous(breaks = c(200000,500000,1000000,150000)) 

sc2 <- ggplot(data = scdata, aes(x = town, y = price_per_sqm, colour = region)) + geom_point() + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1), legend.position='none') + 
    scale_y_continuous(breaks = c(3000,6000,9000,12000,15000)) +
    labs(title = "Price vs Price per sqm")

subplot(ggplotly(sc1), ggplotly(sc2))

When compared, we can see that the distribution of price vs price/sqm is very different. Even though there is a huge jump in price (e.g Queenstown) where there appears to be an “exception”, the price/sqm does not show that jump. Thus, using price/sqm is a better comparision.

However, we do notice that when we look at the highest priced property in Clementi, we noticed that it is actually not the highest priced/sqm. This goes to show that even though area is a huge factor, there maybe other reasons which will still slightly affect the price of a property.

3.6 Comparing prices per location

Show
HDBDATA %>%
  mutate(class = fct_reorder(town, price_per_sqm, .fun="mean")) %>%
  ggplot(aes(y =reorder(town, price_per_sqm),
           x = price_per_sqm, fill = region)) + 
  geom_boxplot() + stat_summary(fun.y=mean, geom = "point", colour="yellow") +
  labs(x = "Price per sqm", y = "Town", title = "Price per sqm by town")
Warning: The `fun.y` argument of `stat_summary()` is deprecated as of ggplot2 3.3.0.
ℹ Please use the `fun` argument instead.

The data generated is not surprising with Central region dominating the price. However, we can tell that Clementi is also a popular area despite being in the West. It is also interesting to note that for old estates like Bishan and Ang Mo Kio, the prices are of the extreme ends with many 1 off exceptions. This was also mentioned in the news for Bishan being sold at a million dollars1.

3.7 Comparing prices per location

Show
HDBDATA %>% 
  group_by(region) %>%
  mutate(class = fct_reorder(region, price_per_sqm, .fun="mean")) %>%
  ggplot(mapping = aes(y = flat_type, x = price_per_sqm)) +
  # Make grouped boxplot
  geom_boxplot(aes(fill = as.factor(region))) +
  theme(legend.position = "top") +
  # Adjust lables and add title
  labs(title = "HDB resale prices in 2022 by region", y="Flat Type", x = "Price per sqm", fill = "flat_type")

Using this boxplot, we can tell that there is a huge price gap by location with Central far off from the rest of the regions for only 5 room and 4 room flats. For the rest of the regions, the rankings are rather similar with North-East being the next most expensive and the West being the cheapest.

For 3 room flats, there is not much price discrepancies.

3.8 Price by Storey

Show
floorheatmap <-
  HDBDATA %>%
  group_by(town, storey_range) %>%
  summarise(median_price = median(price_per_sqm))
`summarise()` has grouped output by 'town'. You can override using the
`.groups` argument.
Show
heatmap <- ggplot(data = floorheatmap, 
                  mapping = aes(x = town, y = storey_range, fill = median_price)) +
            geom_tile() +
  labs(title = "Heatmap of HDB breakdown by area and storey", x = "Town", y = "Storey") +
  scale_fill_gradient(name = "Median Resale Price/sqm",
                      low = "peachpuff",
                      high = "deeppink4")+
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))

heatmap

With this heatmap, we can confirm that the higher the storey leve, the higher the price. Jurong West and Geylang are the only exceptions where there is property which are cheaper for a lower level. These could be exceptions where other factors such as number of years on remaining lease comes into play. Referencing to the graph in 3.1, the property in Geylang and Jurong have varied aged property.

3.9 Property Price over time in 2022

Show
a <-
ggplot(HDBDATA, aes(x = rmlease, y = resale_price, 
                      size = floor_area_sqm, 
                      colour = region)) +
  geom_point(alpha = 0.7, 
             show.legend = FALSE) +
  scale_size(range = c(2, 12)) +
  labs(title = '2022: {as.integer(frame_time)} Month', 
       x = 'Remaining Lease', 
       y = 'Resale Price') +
  transition_time(month) +       #<<
  ease_aes('linear')            #<<

a

From this animated graph over the cross of 12 months, we can tell that there is no hugh fluctuations in property prices by the price for each region is relatively stable with the exception of Central whose prices should be driven other factors.

Legend:
Size of dot: Size of Property
Colour: by Region (Consistent with graphs above)

Conclusion

Singapore’s property prices are rather stable. This can be seen as the price over time and price/sqm did not have huge fluctuation or price differences.

Property prices are highly affected by storey and price differences are mainly due to location.

Footnotes

  1. https://www.straitstimes.com/singapore/housing/five-room-hdb-dbss-flat-in-bishan-sold-for-record-1295-million-three-weeks-after↩︎